On Early Stopping in Gradient Descent Boosting
Abstract
In this paper, we study a family of gradient descent algorithms for approximating the regression function from reproducing kernel Hilbert spaces. Here early stopping plays the role of regularization: given a finite sample and a regularity condition on the regression function, we give a stopping rule and obtain probabilistic upper bounds on the distance between the function reached at the stopping time and the regression function. A crucial advantage over other recently studied regularized least squares algorithms is that early stopping breaks through the saturation phenomenon, in which the convergence rate stops improving once the regression function exceeds a certain level of regularity. The upper bounds show that in some situations optimal convergence rates can be achieved. We also discuss the implications of these results for classification, and address connections with the Landweber iteration for regularization in inverse problems and with online learning algorithms viewed as stochastic approximations of the gradient descent method.
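To make the setting concrete, the following minimal Python sketch runs gradient descent on the empirical squared loss in an RKHS with early stopping. It is an illustration under assumptions, not the paper's procedure: the Gaussian kernel, the constant step size eta, and the hold-out validation stopping rule (a practical stand-in for the paper's theoretical, regularity-dependent rule) are all choices made for this example.

    import numpy as np

    def gaussian_kernel(A, B, sigma=1.0):
        # Gram matrix of the Gaussian (RBF) kernel between the rows of A and B.
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    def kernel_gd_early_stopping(X, y, X_val, y_val, eta=1.0, max_iter=500, patience=10):
        # Represent the iterate as f_t = sum_j alpha_j K(., x_j), starting from f_0 = 0.
        n = len(y)
        K = gaussian_kernel(X, X)          # training Gram matrix
        K_val = gaussian_kernel(X_val, X)  # evaluates f_t at the hold-out points
        alpha = np.zeros(n)
        best_err, best_alpha, since_best = np.inf, alpha.copy(), 0
        for t in range(max_iter):
            residual = K @ alpha - y       # f_t(x_i) - y_i
            alpha -= (eta / n) * residual  # gradient step on the empirical squared loss
            err = np.mean((K_val @ alpha - y_val) ** 2)
            if err < best_err:             # track the best hold-out error so far
                best_err, best_alpha, since_best = err, alpha.copy(), 0
            else:
                since_best += 1
                if since_best >= patience: # early stopping: the hold-out error
                    break                  # has stopped improving
        return best_alpha, best_err

Predictions at a new point x are f(x) = sum_j alpha_j K(x, x_j), i.e. gaussian_kernel(X_new, X) @ best_alpha. Running the iteration past the stopping time typically drives the training error toward zero while the hold-out error turns upward; this is the bias-variance trade-off that the paper's stopping rule resolves theoretically.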
Similar Papers
On Early Stopping in Gradient Descent Learning
In this paper, we study a family of gradient descent algorithms to approximate the regression function from Reproducing Kernel Hilbert Spaces (RKHSs), the family being characterized by polynomially decreasing step sizes (learning rates). By solving a bias-variance trade-off we obtain an early stopping rule and some probabilistic upper bounds for the convergence of the algorithms. Thes...
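In symbols, and hedging on the exact constants (the decay exponent range and the notation K_x = K(x, ·) below are standard conventions, assumed rather than quoted from the paper), the family of iterations starting from f_0 = 0 can be written as

    f_{t+1} \;=\; f_t \;-\; \frac{\eta_t}{n} \sum_{i=1}^{n} \bigl( f_t(x_i) - y_i \bigr)\, K_{x_i},
    \qquad \eta_t = \eta_0\,(t+1)^{-\theta}, \quad \theta \in [0, 1),

so the step sizes eta_t decrease polynomially in t, and the early stopping rule chooses the iteration t* at which the decreasing approximation (bias) error and the increasing sample (variance) error balance.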
Sparse Boosting
We propose Sparse Boosting (the SparseL2Boost algorithm), a variant of boosting with the squared error loss. SparseL2Boost yields sparser solutions than the previously proposed L2Boosting by minimizing a penalized L2 loss function, the FPE model selection criterion, through small-step gradient descent. Although boosting may already give relatively sparse solutions, for example corresponding t...
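For orientation, here is a sketch of plain componentwise L2Boosting, the baseline that SparseL2Boost refines; the penalized (FPE-based) selection step of SparseL2Boost is omitted, and the shrinkage factor nu and step count are illustrative assumptions.

    import numpy as np

    def l2boost(X, y, n_steps=100, nu=0.1):
        # Componentwise L2Boosting: small-step functional gradient descent on the
        # squared error loss, fitting one predictor (coordinate) per step.
        n, p = X.shape
        beta = np.zeros(p)
        r = y.astype(float).copy()               # current residual y - X @ beta
        denom = (X ** 2).sum(axis=0) + 1e-12     # per-column squared norms
        for _ in range(n_steps):
            coefs = X.T @ r / denom              # least-squares fit of each single
                                                 # predictor to the current residual
            j = int(np.argmax(coefs ** 2 * denom))  # coordinate giving the largest
                                                    # residual-sum-of-squares reduction
            beta[j] += nu * coefs[j]             # shrunken ("small") gradient step
            r -= nu * coefs[j] * X[:, j]
        return beta

The small shrinkage factor makes the path of iterates behave like slow gradient descent, so the number of boosting steps acts as the regularization parameter, just as the stopping time does in the kernel setting above.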
Stochastic Particle Gradient Descent for Infinite Ensembles
The superior performance of ensemble methods with infinite models is well known. Most of these methods are based on optimization problems in infinite-dimensional spaces with some regularization; for instance, boosting methods and convex neural networks use L1-regularization with a non-negativity constraint. However, due to the difficulty of handling L1-regularization, these problems require ea...
Geometry of Early Stopping in Linear Networks
A theory of early stopping as applied to linear models is presented. The backpropagation learning algorithm is modeled as gradient descent in continuous time. Given a training set and a validation set, all weight vectors found by early stopping must lie on a certain quadric surface, usually an ellipsoid. Given a training set and a candidate early stopping weight vector, all validation sets have...
Early Stopping as Nonparametric Variational Inference
We show that unconverged stochastic gradient descent can be interpreted as a procedure that samples from a nonparametric approximate posterior distribution. This distribution is implicitly defined by the transformation of an initial distribution by a sequence of optimization steps. By tracking the change in entropy over these distributions during optimization, we form a scalable, unbiased estim...